Fix Qwen3 MoE identity LoRA export layout #688

Merged
FurtherAI merged 2 commits into main from austin/qwen3_moe_lora_codec
May 21, 2026
@FurtherAI
Collaborator

Summary

Fixes Qwen3 MoE step-0 identity LoRA normalization so the identity adapter is exported in the same per-expert Qwen3 MoE layout as trained checkpoints.

Qwen3 MoE identity adapters are initially created through PEFT target-parameter LoRA, which produces fused expert keys like:

  • mlp.experts.base_layer.lora_A/B
  • mlp.experts.lora_A/B

ART now expands those Qwen3 MoE identity tensors into the vLLM/Megatron-compatible per-expert layout:

  • mlp.experts.{expert}.gate_proj.lora_A/B
  • mlp.experts.{expert}.up_proj.lora_A/B
  • mlp.experts.{expert}.down_proj.lora_A/B

This only adds a Qwen3 MoE to_vllm_lora_tensors conversion path. Trained Qwen3 MoE adapters that are already per-expert pass through unchanged.
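
The key-name side of this conversion can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not ART's to_vllm_lora_tensors implementation: the helper names, the num_experts argument, and the optional key suffix (for example ".weight") are hypothetical. It only computes the per-expert key names that a converted adapter would contain. How the fused tensors are sliced per expert and per projection is not covered.

```python
import re

# Matches the fused expert keys named above, with or without "base_layer".
# The optional trailing suffix (e.g. ".weight") is an assumption.
_FUSED = re.compile(r"^(.*mlp\.experts)\.(?:base_layer\.)?(lora_[AB])(\..+)?$")
_PROJECTIONS = ("gate_proj", "up_proj", "down_proj")


def is_fused_expert_key(key: str) -> bool:
    """True for fused keys such as '...mlp.experts.base_layer.lora_A'."""
    return _FUSED.match(key) is not None


def target_key_names(keys: list[str], num_experts: int) -> list[str]:
    """Return the key names expected after expansion (hypothetical helper).

    Fused expert keys are replaced by one name per expert and projection.
    Every other key, including per-expert keys, passes through unchanged.
    """
    out: set[str] = set()
    for key in keys:
        m = _FUSED.match(key)
        if m is None:
            out.add(key)  # already per-expert, or not an expert key at all
            continue
        prefix, ab, suffix = m.group(1), m.group(2), m.group(3) or ""
        for expert in range(num_experts):
            for proj in _PROJECTIONS:
                out.add(f"{prefix}.{expert}.{proj}.{ab}{suffix}")
    return sorted(out)
```

Calling target_key_names on a list that already uses per-expert names returns the same names, which mirrors the pass-through behavior described above.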

Also adds experts to the Qwen3 MoE default target modules so that vLLM wraps the routed FusedMoE layer, while preserving gate_proj, up_proj, and down_proj for Megatron's per-expert LoRA wrapping.

Validation

  • uv run --extra megatron --group dev pytest -q tests/integration/megatron/lora/test_lora_disk_codecs.py -k "qwen3_fused_identity or qwen3_dense_and_moe"
    • 2 passed, 5 deselected
  • yes_no_trainability workflow for Qwen/Qwen3-30B-A3B-Instruct-2507
    • passed
    • initial eval reward: 0.5
    • final eval reward: 0.96875
    • saturated step: 2
    • train grad norms: 66.91, 61.89
  • Confirmed vLLM loaded step @0, @1, and @2 adapters.
  • Confirmed checkpoints 0000, 0001, and 0002 use per-expert Qwen3 MoE keys with no fused base_layer expert keys.
  • Confirmed vllm==0.19.0 parser accepts the same per-expert Qwen3 MoE format.
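
A check along the lines of the checkpoint-key confirmation above could look like the sketch below. It is not the script used for this PR: the function name and the regular expression are hypothetical, and it assumes the caller has already collected the adapter's tensor names (for example from a safetensors file) into a list of strings.

```python
import re

# Per-expert layout: mlp.experts.{expert}.{gate,up,down}_proj.lora_A/B,
# optionally followed by a suffix such as ".weight" (an assumption).
_PER_EXPERT = re.compile(
    r"mlp\.experts\.\d+\.(?:gate_proj|up_proj|down_proj)\.lora_[AB](?:\.|$)"
)


def non_per_expert_keys(keys: list[str]) -> list[str]:
    """Return expert LoRA keys that do not follow the per-expert layout."""
    return [
        k
        for k in keys
        if "mlp.experts." in k and "lora_" in k and not _PER_EXPERT.search(k)
    ]
```

An empty result means no fused base_layer-style expert keys remain among the names that were passed in.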

@FurtherAI FurtherAI marked this pull request as ready for review May 21, 2026 17:39
@Kovbo Kovbo self-requested a review May 21, 2026 18:21
@FurtherAI FurtherAI merged commit 80a66de into main May 21, 2026
5 checks passed
@FurtherAI FurtherAI deleted the austin/qwen3_moe_lora_codec branch May 21, 2026 18:27